Biomediator Data Integration and Inference for Functional Annotation of Anonymous Sequences

Authors

  • Eithon Cadag
  • Brenton Louie
  • Peter J. Myler
  • Peter Tarczy-Hornoch
Abstract

Scientists working on genomics projects often face the difficult task of sifting through large amounts of biological information, dispersed across various online data sources, that are relevant to their area or organism of research. Gene annotation, the process of identifying the functional role of a possible gene, has in particular become increasingly time-consuming and laborious as more genomes are sequenced and the number of candidate genes continues to increase at a near-exponential pace; genes are left un-annotated or, worse, incorrectly annotated. Many groups have attempted to address the annotation backlog through automated annotation systems that are geared toward specific organisms, and which may therefore lack the flexibility and scalability needed to annotate other genomes. In this paper, we present a method and framework that attempts to address problems inherent in manual and automatic annotation by coupling a data integration system, BioMediator, to an inference engine with the aim of elucidating functional annotations. The framework and heuristics developed are not specific to any particular genome. We validated the method with a set of randomly selected annotated sequences from a variety of organisms. Preliminary results show that the hybrid data integration and inference approach generates functional annotations that are as good as or better than "gold standard" annotations approximately 80% of the time.


Similar resources

Expression Array Annotation Using the BioMediator Biological Data Integration System and the BioConductor Analytic Platform

This paper presents the implementation of a model for expression array annotation (EAA) using the BioMediator biological data integration system along with BioConductor, an analytic tools platform. The model presented addresses the need for annotation sources identified during BioConductor's development. Annotation provides us with well-curated genomic background knowl...


Determining the Feasibility and Value of Federated Data Integration with Combinations of Logical and Probabilistic Inference for SNP Annotation

Terry Hsin-Yi Shen. Chair of the Supervisory Committee: Professor Peter Tarczy-Hornoch, Department of Medical Education and Biomedical Health Informatics. Most common and complex diseases are influenced at some level by variation in the genome. The future ...


Family Classification and Integrative Analysis for Protein Functional Annotation

High-throughput genome projects have resulted in a rapid accumulation of predicted protein sequences; however, experimentally verified information on protein function lags far behind. The common approach of inferring the function of uncharacterized proteins from sequence similarity to annotated proteins in sequence databases often results in over-identification, under-identification, or even...


The BioMediator System as a Data Integration Tool to Answer Diverse Biologic Queries

We present the BioMediator (www.biomediator.org) system and the process of executing queries on it. The system was designed as a tool for posing queries across semantically and syntactically heterogeneous data, particularly in the biological arena. We use examples from researchers at the University of Washington and the University of Missouri-Columbia to discuss the BioMediator system architec...


FunCat functional inference with belief propagation and feature integration

Pairwise comparison of sequence data is used intensively for automated functional protein annotation, while graphical models are emerging as promising candidates for integrating various heterogeneous features. We designed a model, termed hRMN, that integrates different genomic features, and implemented a variant of belief propagation for functional annotation transfer. hRMN allows the assignment ...



Journal:
  • Pacific Symposium on Biocomputing

Publication date: 2007